ETSI TS 123 038 V4.2.0



(2001-03) 



Technical Specification 



Digital cellular telecommunications system (Phase 2+) (GSM); 

Universal Mobile Telecommunications System (UMTS); 

Alphabets and language-specific information 

(3GPP TS 23.038 version 4.2.0 Release 4) 







3GPP TS 23.038 version 4.2.0 Release 4 ETSI TS 123 038 V4.2.0 (2001-03)



Reference 



RTS/TSGT-0223038Uv4 
Keywords 



GSM, UMTS 



ETSI 

650 Route des Lucioles 
F-06921 Sophia Antipolis Cedex - FRANCE 

Tel.: +33 4 92 94 42 00 Fax: +33 4 93 65 47 16 

Siret N°348 623 562 00017 - NAF 742 C 
Non-profit association registered at the Sub-Prefecture of Grasse (06) N° 7803/88



Important notice 



Individual copies of the present document can be downloaded from: 
http://www.etsi.org

The present document may be made available in more than one electronic version or in print. In any case of existing or perceived difference in contents between such versions, the reference version is the Portable Document Format (PDF).

In case of dispute, the reference shall be the printing on ETSI printers of the PDF version kept on a specific network drive within ETSI Secretariat.

Users of the present document should be aware that the document may be subject to revision or change of status. 
Information on the current status of this and other ETSI documents is available at http://www.etsi.org/tb/status/

If you find errors in the present document, send your comment to: 
editor@etsi.fr 

Copyright Notification 

No part may be reproduced except as authorized by written permission. 
The copyright and the foregoing restriction extend to reproduction in all media. 

© European Telecommunications Standards Institute 2001. 
All rights reserved. 






Intellectual Property Rights 



IPRs essential or potentially essential to the present document may have been declared to ETSI. The information 
pertaining to these essential IPRs, if any, is publicly available for ETSI members and non-members, and can be found 

Foreword

This Technical Specification (TS) has been produced by the ETSI 3rd Generation Partnership Project (3GPP).

The present document may refer to technical specifications or reports using their 3GPP identities, UMTS identities or 
GSM identities. These should be interpreted as being references to the corresponding ETSI deliverables. 

The cross reference between GSM, UMTS, 3GPP and ETSI identities can be found under www.etsi.org/key.

Foreword

This Technical Specification has been produced by the 3GPP. 

The contents of the present document are subject to continuing work within the TSG and may change following formal 
TSG approval. Should the TSG modify the contents of this TS, it will be re-released by the TSG with an identifying 
change of release date and an increase in version number as follows: 

Version x.y.z 

where: 

x the first digit:

1 presented to TSG for information;

2 presented to TSG for approval;

3 indicates TSG approved document under change control.

y the second digit is incremented for all changes of substance, i.e. technical enhancements, corrections, 
updates, etc. 

z the third digit is incremented when editorial only changes have been incorporated in the document.

1 Scope



This TS defines the alphabets, languages and message handling requirements for SMS, CBS and USSD and may 
additionally be used for Man Machine Interface (MMI) (3GPP TS 22.030 [2]). 

The specification for the Data Circuit terminating Equipment/Data Terminal Equipment (DCE/DTE) interface (3GPP 
TS 27.005 [8]) will also use the codes specified herein for the transfer of SMS data to an external terminal. 



2 Normative references



The following documents contain provisions which, through reference in this text, constitute provisions of the present 
document. 

• References are either specific (identified by date of publication, edition number, version number, etc.) or 
non-specific. 

• For a specific reference, subsequent revisions do not apply. 

• For a non-specific reference, the latest version applies. In the case of a reference to a 3GPP document (including 
a GSM document), a non-specific reference implicitly refers to the latest version of that document in the same 
Release as the present document. 

[1] GSM 01.04: "Digital cellular telecommunication system (Phase 2+); Abbreviations and acronyms".

[2] 3GPP TS 22.030: "Man-Machine Interface (MMI) of the Mobile Station (MS)".

[3] 3GPP TS 23.090: "Unstructured Supplementary Service Data (USSD) - Stage 2".

[4] 3GPP TS 23.040: "Technical realization of the Short Message Service (SMS)".

[5] 3GPP TS 23.041: "Technical realization of the Cell Broadcast Service (CBS)".

[6] 3GPP TS 24.011: "Short Message Service (SMS) support on mobile radio interface".

[7] 3GPP TS 24.012: "Cell Broadcast Service (CBS) support on the mobile radio interface".

[8] 3GPP TS 27.005: "Use of Data Terminal Equipment - Data Circuit terminating Equipment (DTE - DCE) interface for Short Message Service (SMS) and Cell Broadcast Service (CBS)".

[10] ISO/IEC 10646: "Universal Multiple-Octet Coded Character Set (UCS)"; UCS2, 16 bit coding.

[11] 3GPP TS 24.090: "Unstructured Supplementary Service Data (USSD) - Stage 3".

[12] ISO 639: "Code for the representation of names of languages".

[13] 3GPP TS 23.042: "Compression algorithm for text messaging services".

[14] 3GPP TR 21.905: "3G Vocabulary".

[15] "Wireless Datagram Protocol Specification", Wireless Application Protocol Forum Ltd.

3 Abbreviations 

Abbreviations used in this TS are listed in GSM TR 01.04 [1] and 3GPP TR 21.905 [14]. 

4 SMS Data Coding Scheme



The TP-Data-Coding-Scheme field, defined in 3GPP TS 23.040 [4], indicates the data coding scheme of the TP-UD field, and may indicate a message class. Any reserved codings shall be assumed to be the GSM 7 bit default alphabet (the same as codepoint 00000000) by a receiving entity. The octet is used according to a coding group which is indicated in bits 7..4. The octet is then coded as follows:



Coding Group Bits 7..4    Use of bits 3..0

00xx

General Data Coding indication
Bits 5..0 indicate the following:

Bit 5, if set to 0, indicates the text is uncompressed.

Bit 5, if set to 1, indicates the text is compressed using the compression algorithm defined in 3GPP TS 23.042 [13].

Bit 4, if set to 0, indicates that bits 1 to 0 are reserved and have no message class meaning.

Bit 4, if set to 1, indicates that bits 1 to 0 have a message class meaning:

Bit 1  Bit 0  Message Class
0      0      Class 0
0      1      Class 1  Default meaning: ME-specific.
1      0      Class 2  (U)SIM specific message.
1      1      Class 3  Default meaning: TE specific (see 3GPP TS 27.005 [8]).

Bits 3 and 2 indicate the alphabet being used, as follows:

Bit 3  Bit 2  Alphabet
0      0      GSM 7 bit default alphabet
0      1      8 bit data
1      0      UCS2 (16 bit) [10]
1      1      Reserved

NOTE: The special case of bits 7..0 being 0000 0000 indicates the GSM 7 bit default alphabet with no message class.

01xx

Message Marked for Automatic Deletion Group

This group can be used by the SM originator to mark the message (stored in the ME or (U)SIM) for deletion after reading, irrespective of the message class.
The way the ME will process this deletion should be manufacturer specific but shall be done without the intervention of the End User or the targeted application. The mobile manufacturer may optionally provide a means for the user to prevent this automatic deletion.

Bits 5..0 are coded exactly the same as Group 00xx.


1000..1011

Reserved coding groups


1100

Message Waiting Indication Group: Discard Message

The specification for this group is exactly the same as for Group 1101, except that:

after presenting an indication and storing the status, the ME may discard the contents of the message.

The ME shall be able to receive, process and acknowledge messages in this group, irrespective of memory availability for other types of short message.


1101

Message Waiting Indication Group: Store Message

This Group defines an indication to be provided to the user about the status of types of message waiting on systems connected to the GSM/UMTS PLMN. The ME should present this indication as an icon on the screen, or other MMI indication. The ME shall update the contents of the Message Waiting Indication Status on the USIM (see 3GPP TS 31.102) when present or otherwise should store the status in the ME. The contents of the Message Waiting Indication Status should control the ME indicator. For each indication supported, the mobile may provide storage for the Origination Address. The ME may take note of the Origination Address for messages in this group and group 1100.

Text included in the user data is coded in the GSM 7 bit default alphabet.

Where a message is received with bits 7..4 set to 1101, the mobile shall store the text of the SMS message in addition to setting the indication. The indication setting should take place irrespective of memory availability to store the short message.

Bit 3 indicates Indication Sense:

Bit 3  Indication Sense
0      Set Indication Inactive
1      Set Indication Active

Bit 2 is reserved, and set to 0.

Bit 1  Bit 0  Indication Type
0      0      Voicemail Message Waiting
0      1      Fax Message Waiting
1      0      Electronic Mail Message Waiting
1      1      Other Message Waiting*

* Mobile manufacturers may implement the "Other Message Waiting" indication as an additional indication without specifying the meaning. The meaning of this indication is intended to be standardized in the future, so Operators should not make use of this indication until the standard for this indication is finalized.


1110

Message Waiting Indication Group: Store Message

The coding of bits 3..0 and functionality of this feature are the same as for the Message Waiting Indication Group above (bits 7..4 set to 1101), with the exception that the text included in the user data is coded in the uncompressed UCS2 alphabet.


1111

Data coding/message class
Bit 3 is reserved, set to 0.

Bit 2  Message coding
0      GSM 7 bit default alphabet
1      8-bit data

Bit 1  Bit 0  Message Class
0      0      Class 0
0      1      Class 1  default meaning: ME-specific.
1      0      Class 2  (U)SIM-specific message.
1      1      Class 3  default meaning: TE specific (see 3GPP TS 27.005 [8]).
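To make the coding-group table concrete, the following Python sketch interprets a TP-DCS octet. The function `decode_sms_dcs`, the `SmsDcs` record and its group labels are hypothetical names chosen for illustration. Reserved alphabet values and reserved coding groups fall back to the GSM 7 bit default alphabet, as required above for a receiving entity; the Message Waiting indication bits of groups 1100 to 1110 are not decoded here.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class SmsDcs:
    group: str                      # hypothetical label for the coding group
    alphabet: str                   # "gsm7", "8bit" or "ucs2"
    message_class: Optional[int]    # 0..3, or None if bits 1..0 carry no class
    compressed: bool = False

# Bits 3..2 in groups 00xx and 01xx; the value 0b11 is reserved.
_ALPHABETS = {0b00: "gsm7", 0b01: "8bit", 0b10: "ucs2"}

def decode_sms_dcs(octet: int) -> SmsDcs:
    """Interpret a TP-Data-Coding-Scheme octet according to its coding group (bits 7..4)."""
    if not 0 <= octet <= 0xFF:
        raise ValueError("TP-DCS is a single octet")
    group = octet >> 4
    if group <= 0b0111:
        # 00xx (general) and 01xx (automatic deletion) share the layout of bits 5..0.
        has_class = bool(octet & 0b0001_0000)                      # bit 4
        return SmsDcs(
            group="auto-delete" if group & 0b0100 else "general",
            alphabet=_ALPHABETS.get((octet >> 2) & 0b11, "gsm7"),  # reserved -> GSM 7 bit
            message_class=(octet & 0b11) if has_class else None,
            compressed=bool(octet & 0b0010_0000),                  # bit 5
        )
    if group == 0b1100:
        return SmsDcs("mwi-discard", "gsm7", None)
    if group == 0b1101:
        return SmsDcs("mwi-store", "gsm7", None)
    if group == 0b1110:
        return SmsDcs("mwi-store-ucs2", "ucs2", None)
    if group == 0b1111:
        # Bit 2 selects the coding; bits 1..0 always give the message class.
        return SmsDcs("data-coding", "8bit" if octet & 0b100 else "gsm7", octet & 0b11)
    # 1000..1011: reserved coding groups, assumed to be the GSM 7 bit default alphabet.
    return SmsDcs("reserved", "gsm7", None)
```

Testing the group nibble first mirrors the table: because groups 00xx and 01xx share the same layout for bits 5..0, a single branch covers both.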



GSM 7 bit default alphabet indicates that the TP-UD is coded from the GSM 7 bit default alphabet given in subclause 6.2.1. When this alphabet is used, the characters of the message are packed in octets as shown in subclause 6.1.2.1.1, and the message can consist of up to 160 characters. The GSM 7 bit default alphabet shall be supported by all MSs and SCs offering the service. If the GSM 7 bit default alphabet extension mechanism is used, the number of displayable characters is reduced by one for every instance where the GSM 7 bit default alphabet extension table is used.

8-bit data indicates that the TP-UD has user-defined coding, and the message can consist of up to 140 octets.

UCS2 alphabet indicates that the TP-UD has a UCS2 [10] coded message, and the message can consist of up to 140 
octets, i.e. up to 70 UCS2 characters. The General notes specified in subclause 6.1.1 override any contrary specification 
in UCS2, so for example even in UCS2 a <CR> character will cause the MS to return to the beginning of the current 
line and overwrite any existing text with the characters which follow the <CR>. 
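The capacity figures above all follow from the 140-octet user data field. A minimal Python sketch of the arithmetic, assuming an uncompressed message; `max_characters` and its alphabet labels are hypothetical names:

```python
TP_UD_MAX_OCTETS = 140  # maximum user data length stated above for SMS

def max_characters(alphabet: str, extension_chars: int = 0,
                   octets: int = TP_UD_MAX_OCTETS) -> int:
    """Maximum number of displayable characters in an uncompressed short message."""
    if alphabet == "gsm7":
        septets = (octets * 8) // 7      # 140 octets -> 160 septets
        # Each extension-table character costs an escape septet plus its own code.
        return septets - extension_chars
    if alphabet == "8bit":
        return octets                    # one octet per user-defined character
    if alphabet == "ucs2":
        return octets // 2               # 16 bits per UCS2 character
    raise ValueError(f"unknown alphabet: {alphabet}")

print(max_characters("gsm7"), max_characters("gsm7", extension_chars=3),
      max_characters("8bit"), max_characters("ucs2"))
# prints: 160 157 140 70
```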

When a message is compressed, the TP-UD consists of the GSM 7 bit default alphabet or UCS2 alphabet compressed 
message, and the compressed message itself can consist of up to 140 octets in total. 

When a mobile terminated message is Class 0 and the MS has the capability of displaying short messages, the MS shall
display the message immediately and send an acknowledgement to the SC when the message has successfully reached 
the MS irrespective of whether there is memory available in the (U)SIM or ME. The message shall not be automatically 
stored in the (U)SIM or ME. 

The ME may make provision through MMI for the user to selectively prevent the message from being displayed 
immediately. 

If the ME is incapable of displaying short messages or if the immediate display of the message has been disabled through MMI, then the ME shall treat the short message as though there was no message class, i.e. it will ignore bits 0 and 1 in the TP-DCS and normal rules for memory capacity exceeded shall apply.

When a mobile terminated message is Class 1, the MS shall send an acknowledgement to the SC when the message has 
successfully reached the MS and can be stored. The MS shall normally store the message in the ME by default, if that is 
possible, but otherwise the message may be stored elsewhere, e.g. in the (U)SIM. The user may be able to override the 
default meaning and select their own routing. 

When a mobile terminated message is Class 2 ((U)SIM-specific), an MS shall ensure that the message has been transferred to the SMS data field in the (U)SIM before sending an acknowledgement to the SC. The MS shall return a "protocol error, unspecified" error message (see 3GPP TS 24.011 [6]) if the short message cannot be stored in the (U)SIM and there is other short message storage available at the MS. If all the short message storage at the MS is already in use, the MS shall return "memory capacity exceeded". This behaviour applies in all cases except for an MS supporting (U)SIM Application Toolkit when the Protocol Identifier (TP-PID) of the mobile terminated message is set to "(U)SIM Data download" (see 3GPP TS 23.040 [4]).

When a mobile terminated message is Class 3, the MS shall send an acknowledgement to the SC when the message has successfully reached the MS and can be stored, irrespective of whether the MS supports an SMS interface to a TE, and without waiting for the message to be transferred to the TE. Thus the acknowledgement to the SC of a TE-specific message does not imply that the message has reached the TE. Class 3 messages shall normally be transferred to the TE when the TE requests "TE-specific" messages (see 3GPP TS 27.005 [8]). The user may be able to override the default meaning and select their own routing.

The message class codes may also be used for mobile originated messages, to provide an indication to the destination 
SME of how the message was handled at the MS. 

The MS will not interpret reserved or unsupported values but shall store them as received. The SC may reject messages 
with a Data Coding Scheme containing a reserved value or one which is not supported. 

5 CBS Data Coding Scheme



The CBS Data Coding Scheme indicates the intended handling of the message at the MS, the alphabet/coding, and the language (when applicable). Any reserved codings shall be assumed to be the GSM 7 bit default alphabet (the same as codepoint 00001111) by a receiving entity. The octet is used according to a coding group which is indicated in bits 7..4. The octet is then coded as follows:



Coding Group Bits 7..4    Use of bits 3..0

0000

Language using the GSM 7 bit default alphabet

Bits 3..0 indicate the language:

0000 German
0001 English
0010 Italian
0011 French
0100 Spanish
0101 Dutch
0110 Swedish
0111 Danish
1000 Portuguese
1001 Finnish
1010 Norwegian
1011 Greek
1100 Turkish
1101 Hungarian
1110 Polish
1111 Language unspecified


0001

0000 GSM 7 bit default alphabet; message preceded by language indication.

The first 3 characters of the message are a two-character representation of the language encoded according to ISO 639 [12], followed by a CR character. The CR character is then followed by 90 characters of text.

0001 UCS2; message preceded by language indication.

The message starts with a two-character representation of the language, in the GSM 7 bit default alphabet, encoded according to ISO 639 [12]. This is padded to the octet boundary with two bits set to 0 and then followed by 40 characters of UCS2-encoded message.

An MS not supporting UCS2 coding will present the two-character language identifier followed by improperly interpreted user data.

0010..1111 Reserved


0010

0000 Czech
0001 Hebrew
0010 Arabic
0011 Russian
0100 Icelandic
0101..1111 Reserved for other languages using the GSM 7 bit default alphabet, with unspecified handling at the MS


0011

0000..1111 Reserved for other languages using the GSM 7 bit default alphabet, with unspecified handling at the MS


01xx

General Data Coding indication
Bits 5..0 indicate the following:

Bit 5, if set to 0, indicates the text is uncompressed.

Bit 5, if set to 1, indicates the text is compressed using the compression algorithm defined in 3GPP TS 23.042 [13].

Bit 4, if set to 0, indicates that bits 1 to 0 are reserved and have no message class meaning.

Bit 4, if set to 1, indicates that bits 1 to 0 have a message class meaning:

Bit 1  Bit 0  Message Class
0      0      Class 0
0      1      Class 1  Default meaning: ME-specific.
1      0      Class 2  (U)SIM specific message.
1      1      Class 3  Default meaning: TE-specific (see 3GPP TS 27.005 [8]).

Bits 3 and 2 indicate the alphabet being used, as follows:

Bit 3  Bit 2  Alphabet
0      0      GSM 7 bit default alphabet
0      1      8 bit data
1      0      UCS2 (16 bit) [10]
1      1      Reserved


1000..1101

Reserved coding groups


1110 


Defined by the WAP Forum [15] 


1111

Data coding / message handling
Bit 3 is reserved, set to 0.

Bit 2  Message coding
0      GSM 7 bit default alphabet
1      8 bit data

Bit 1  Bit 0  Message Class
0      0      No message class.
0      1      Class 1  user defined.
1      0      Class 2  user defined.
1      1      Class 3  default meaning: TE specific (see 3GPP TS 27.005 [8]).
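As an illustration of the CBS coding groups, this Python sketch maps a CBS Data Coding Scheme octet to an alphabet and a short description. The function `describe_cbs_dcs` and the returned labels are hypothetical. Reserved codings are treated as the GSM 7 bit default alphabet, as stated above, and group 1110 is only flagged, since its contents are defined by the WAP Forum [15].

```python
from typing import Tuple

# Bits 3..0 of coding group 0000, in table order.
GROUP_0000 = ["German", "English", "Italian", "French", "Spanish", "Dutch",
              "Swedish", "Danish", "Portuguese", "Finnish", "Norwegian",
              "Greek", "Turkish", "Hungarian", "Polish", "Language unspecified"]
# Bits 3..0 of coding group 0010; 0101..1111 are reserved.
GROUP_0010 = ["Czech", "Hebrew", "Arabic", "Russian", "Icelandic"]
_ALPHABETS = {0b00: "gsm7", 0b01: "8bit", 0b10: "ucs2"}

def describe_cbs_dcs(octet: int) -> Tuple[str, str]:
    """Return (alphabet, description) for a CBS Data Coding Scheme octet."""
    if not 0 <= octet <= 0xFF:
        raise ValueError("the CBS Data Coding Scheme is a single octet")
    group, low = octet >> 4, octet & 0x0F
    if group == 0b0000:
        return "gsm7", GROUP_0000[low]
    if group == 0b0001 and low == 0b0000:
        return "gsm7", "language indication: ISO 639 code + CR"
    if group == 0b0001 and low == 0b0001:
        return "ucs2", "language indication: ISO 639 code in GSM 7 bit"
    if group == 0b0010 and low < len(GROUP_0010):
        return "gsm7", GROUP_0010[low]
    if 0b0100 <= group <= 0b0111:                         # 01xx
        alphabet = _ALPHABETS.get((low >> 2) & 0b11, "gsm7")
        return alphabet, "compressed" if octet & 0x20 else "uncompressed"
    if group == 0b1110:
        return "wap", "defined by the WAP Forum"
    if group == 0b1111:
        return ("8bit" if low & 0b100 else "gsm7"), f"class bits {low & 0b11:02b}"
    # Everything else is reserved; assumed to be the GSM 7 bit default alphabet.
    return "gsm7", "reserved"
```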



These codings may also be used for USSD and MMI/display purposes. 

See 3GPP TS 24.090 [11] for specific coding values applicable to USSD for MS originated USSD messages and MS
terminated USSD messages. USSD messages using the default alphabet are coded with the GSM 7-bit default alphabet 
given in subclause 6.2.1. The message can then consist of up to 182 user characters. 

Cell Broadcast messages using the default alphabet are coded with the GSM 7-bit default alphabet given in 
subclause 6.2.1. The message then consists of 93 user characters. 

If the GSM 7 bit default alphabet extension mechanism is used, the number of displayable characters is reduced by one for every instance where the GSM 7 bit default alphabet extension table is used.

Cell Broadcast messages using 8-bit data have user-defined coding, and will be 82 octets in length.

UCS2 alphabet indicates that the message is coded in UCS2 [10]. The General notes specified in subclause 6.1.1 
override any contrary specification in UCS2, so for example even in UCS2 a <CR> character will cause the MS to 
return to the beginning of the current line and overwrite any existing text with the characters which follow the <CR>. 
Messages encoded in UCS2 consist of 41 characters. 
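The CBS and USSD character counts above, the 160-octet USSD figure used in subclause 6.1.2.3.1, and the 90- and 40-character limits quoted for CBS messages preceded by a language indication can all be checked with a few lines of Python arithmetic (the variable names are illustrative only):

```python
CBS_OCTETS = 82    # user data octets in a Cell Broadcast message
USSD_OCTETS = 160  # octets available for a USSD string

cbs_gsm7 = CBS_OCTETS * 8 // 7                      # 93 characters
cbs_spare_bits = CBS_OCTETS * 8 - cbs_gsm7 * 7      # 5 bits, set to zero
cbs_ucs2 = CBS_OCTETS // 2                          # 41 UCS2 characters

# Coding group 0001: text left after the language indication.
cbs_gsm7_after_lang = cbs_gsm7 - 3                  # 2-character ISO 639 code + CR -> 90
cbs_ucs2_after_lang = (CBS_OCTETS - 2) // 2         # 14 bits of code + 2 pad bits -> 40

ussd_gsm7 = USSD_OCTETS * 8 // 7                    # 182 characters
ussd_spare_bits = USSD_OCTETS * 8 - ussd_gsm7 * 7   # 6 bits

print(cbs_gsm7, cbs_spare_bits, cbs_ucs2, cbs_gsm7_after_lang,
      cbs_ucs2_after_lang, ussd_gsm7, ussd_spare_bits)
# prints: 93 5 41 90 40 182 6
```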

Class 1 and Class 2 messages may be routed by the ME to user-defined destinations, but the user may override any 
default meaning and select their own routing. 

Class 3 messages will normally be selected for transfer to a TE, in cases where an ME supports an SMS/CBS interface to
a TE, and the TE requests "TE-specific" cell broadcast messages (see 3GPP TS 27.005 [8]). The user may be able to 
override the default meaning and select their own routing. 

6 Individual parameters

6.1 General principles 

6.1.1 General notes 

Except where otherwise indicated, the following shall apply to all alphabet tables: 

1: The characters marked "1)" are not used but are displayed as a space.

2: The characters of this set, when displayed, should approximate to the appearance of the relevant characters 
specified in ISO 1073 and the relevant national standards. 

3: Control characters: 

Code  Meaning

LF    Line feed: Any characters following LF which are to be displayed shall be presented as the next line of the message, commencing with the first character position.

CR    Carriage return: Any characters following CR which are to be displayed shall be presented as the current line of the message, commencing with the first character position.

SP    Space character.

SP Space character. 

4: The display of characters within a message is achieved by taking each character in turn and placing it in the next 
available space from left to right and top to bottom. 
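A small Python sketch of how a display might apply notes 3 and 4. The function `render` is hypothetical and ignores the display width (no wrapping). It assumes that text following <CR> overwrites the current line position by position, which is one reading of the description of <CR> given for UCS2 messages under the SMS Data Coding Scheme ("overwrite any existing text with the characters which follow the <CR>").

```python
from typing import List

def render(text: str) -> List[str]:
    """Lay out message text as display lines, honouring LF and CR."""
    lines: List[List[str]] = [[]]
    col = 0
    for ch in text:
        if ch == "\n":          # LF: continue on the next line, first position
            lines.append([])
            col = 0
        elif ch == "\r":        # CR: back to the first position of the current line
            col = 0
        else:
            line = lines[-1]
            if col < len(line):
                line[col] = ch  # overwrite existing text (assumed positional)
            else:
                line.append(ch)
            col += 1
    return ["".join(line) for line in lines]
```

With this reading, `render("abc\rX")` gives `["Xbc"]`, and a doubled <CR> behaves exactly like a single one.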

6.1.2 Character packing
6.1.2.1 SMS Packing 

6.1.2.1.1 Packing of 7-bit characters 

If a character number α is noted in the following way:

b7   b6   b5   b4   b3   b2   b1
αa   αb   αc   αd   αe   αf   αg

The packing of the 7-bit characters in octets is done by completing the octets with zeros on the left.

For example, packing:

one character in one octet:

bits number:

7    6    5    4    3    2    1    0
0    1a   1b   1c   1d   1e   1f   1g

two characters in two octets:

bits number:

7    6    5    4    3    2    1    0
2g   1a   1b   1c   1d   1e   1f   1g
0    0    2a   2b   2c   2d   2e   2f

three characters in three octets:

bits number:

7    6    5    4    3    2    1    0
2g   1a   1b   1c   1d   1e   1f   1g
3f   3g   2a   2b   2c   2d   2e   2f
0    0    0    3a   3b   3c   3d   3e

seven characters in seven octets:

bits number:

7    6    5    4    3    2    1    0
2g   1a   1b   1c   1d   1e   1f   1g
3f   3g   2a   2b   2c   2d   2e   2f
4e   4f   4g   3a   3b   3c   3d   3e
5d   5e   5f   5g   4a   4b   4c   4d
6c   6d   6e   6f   6g   5a   5b   5c
7b   7c   7d   7e   7f   7g   6a   6b
0    0    0    0    0    0    0    7a

eight characters in seven octets:

bits number:

7    6    5    4    3    2    1    0
2g   1a   1b   1c   1d   1e   1f   1g
3f   3g   2a   2b   2c   2d   2e   2f
4e   4f   4g   3a   3b   3c   3d   3e
5d   5e   5f   5g   4a   4b   4c   4d
6c   6d   6e   6f   6g   5a   5b   5c
7b   7c   7d   7e   7f   7g   6a   6b
8a   8b   8c   8d   8e   8f   8g   7a

The bit number zero is always transmitted first.

Therefore, in 140 octets, it is possible to pack (140 x 8)/7 = 160 characters.
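The diagrams above amount to concatenating the 7-bit codes least significant bit first and cutting the resulting bit stream into octets. A Python sketch of that rule follows; the function name `pack_septets` is hypothetical. For the lowercase letters a to z the GSM 7 bit default alphabet codes coincide with ASCII (see subclause 6.2.1), which is why the usage line can pass raw ASCII bytes.

```python
from typing import List

def pack_septets(septets: List[int]) -> bytes:
    """Pack 7-bit codes into octets; bit 0 of the first character becomes bit 0 of octet 1."""
    out = bytearray()
    acc = 0    # bits not yet written; the lowest bit is the next one to transmit
    nbits = 0
    for code in septets:
        if not 0 <= code <= 0x7F:
            raise ValueError(f"not a 7-bit code: {code}")
        acc |= code << nbits
        nbits += 7
        while nbits >= 8:
            out.append(acc & 0xFF)
            acc >>= 8
            nbits -= 8
    if nbits:
        out.append(acc)        # leftover bits; the unused high bits stay zero
    return bytes(out)

print(pack_septets(list(b"hellohello")).hex().upper())
# prints: E8329BFD4697D9EC37
```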



6.1.2.2 CBS Packing

6.1.2.2.1 Packing of 7-bit characters

If a character number α is noted in the following way:

b7   b6   b5   b4   b3   b2   b1
αa   αb   αc   αd   αe   αf   αg

the packing of the 7-bit characters in octets is done as follows:

octet number   bit number
               7    6    5    4    3    2    1    0
1              2g   1a   1b   1c   1d   1e   1f   1g
2              3f   3g   2a   2b   2c   2d   2e   2f
3              4e   4f   4g   3a   3b   3c   3d   3e
4              5d   5e   5f   5g   4a   4b   4c   4d
5              6c   6d   6e   6f   6g   5a   5b   5c
6              7b   7c   7d   7e   7f   7g   6a   6b
7              8a   8b   8c   8d   8e   8f   8g   7a
8              10g  9a   9b   9c   9d   9e   9f   9g
...
81             93d  93e  93f  93g  92a  92b  92c  92d
82             0    0    0    0    0    93a  93b  93c

The bit number zero is always transmitted first.

Therefore, in 82 octets, it is possible to pack (82 x 8)/7 = 93.7, that is 93 characters. The 5 remaining bits are set to zero as stated above.
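CBS uses the same bit order, so unpacking simply reverses it. In this Python sketch (the name `unpack_septets` is hypothetical) the octets are read as one little-endian integer, so that bit 0 of octet 1 is the first bit of the first character. When no character count is given, every complete septet is returned, which for 82 octets yields the 93 characters noted above and leaves the 5 spare bits unused.

```python
from typing import List, Optional

def unpack_septets(data: bytes, count: Optional[int] = None) -> List[int]:
    """Recover 7-bit codes from packed octets (bit 0 of octet 1 is the first bit)."""
    if count is None:
        count = len(data) * 8 // 7          # every complete septet, e.g. 82 octets -> 93
    if count * 7 > len(data) * 8:
        raise ValueError("not enough octets for the requested number of characters")
    value = int.from_bytes(data, "little")  # octet 1 supplies the least significant bits
    return [(value >> (7 * i)) & 0x7F for i in range(count)]

print(unpack_septets(bytes.fromhex("E8329BFD4697D9EC37")))
# prints: [104, 101, 108, 108, 111, 104, 101, 108, 108, 111]
```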

6.1.2.3 USSD packing 

6.1.2.3.1 Packing of 7 bit characters

If a character number α is noted in the following way:

b7   b6   b5   b4   b3   b2   b1
αa   αb   αc   αd   αe   αf   αg

The packing of the 7-bit characters in octets is done by completing the octets with zeros on the left.

For example, packing:

one character in one octet:

bits number:

7    6    5    4    3    2    1    0
0    1a   1b   1c   1d   1e   1f   1g

two characters in two octets:

bits number:

7    6    5    4    3    2    1    0
2g   1a   1b   1c   1d   1e   1f   1g
0    0    2a   2b   2c   2d   2e   2f

three characters in three octets:

bits number:

7    6    5    4    3    2    1    0
2g   1a   1b   1c   1d   1e   1f   1g
3f   3g   2a   2b   2c   2d   2e   2f
0    0    0    3a   3b   3c   3d   3e






six characters in six octets:

bits number:

7    6    5    4    3    2    1    0
2g   1a   1b   1c   1d   1e   1f   1g
3f   3g   2a   2b   2c   2d   2e   2f
4e   4f   4g   3a   3b   3c   3d   3e
5d   5e   5f   5g   4a   4b   4c   4d
6c   6d   6e   6f   6g   5a   5b   5c
0    0    0    0    0    0    6a   6b

seven characters in seven octets:

bits number:

7    6    5    4    3    2    1    0
2g   1a   1b   1c   1d   1e   1f   1g
3f   3g   2a   2b   2c   2d   2e   2f
4e   4f   4g   3a   3b   3c   3d   3e
5d   5e   5f   5g   4a   4b   4c   4d
6c   6d   6e   6f   6g   5a   5b   5c
7b   7c   7d   7e   7f   7g   6a   6b
0    0    0    1    1    0    1    7a

The bit number zero is always transmitted first.

eight characters in seven octets:

bits number:

7    6    5    4    3    2    1    0
2g   1a   1b   1c   1d   1e   1f   1g
3f   3g   2a   2b   2c   2d   2e   2f
4e   4f   4g   3a   3b   3c   3d   3e
5d   5e   5f   5g   4a   4b   4c   4d
6c   6d   6e   6f   6g   5a   5b   5c
7b   7c   7d   7e   7f   7g   6a   6b
8a   8b   8c   8d   8e   8f   8g   7a

nine characters in eight octets:

bits number:

7    6    5    4    3    2    1    0
2g   1a   1b   1c   1d   1e   1f   1g
3f   3g   2a   2b   2c   2d   2e   2f
4e   4f   4g   3a   3b   3c   3d   3e
5d   5e   5f   5g   4a   4b   4c   4d
6c   6d   6e   6f   6g   5a   5b   5c
7b   7c   7d   7e   7f   7g   6a   6b
8a   8b   8c   8d   8e   8f   8g   7a
0    9a   9b   9c   9d   9e   9f   9g

fifteen characters in fourteen octets:

bits number:

7    6    5    4    3    2    1    0
2g   1a   1b   1c   1d   1e   1f   1g
3f   3g   2a   2b   2c   2d   2e   2f
4e   4f   4g   3a   3b   3c   3d   3e
5d   5e   5f   5g   4a   4b   4c   4d
6c   6d   6e   6f   6g   5a   5b   5c
7b   7c   7d   7e   7f   7g   6a   6b
8a   8b   8c   8d   8e   8f   8g   7a
10g  9a   9b   9c   9d   9e   9f   9g
11f  11g  10a  10b  10c  10d  10e  10f
12e  12f  12g  11a  11b  11c  11d  11e
13d  13e  13f  13g  12a  12b  12c  12d
14c  14d  14e  14f  14g  13a  13b  13c
15b  15c  15d  15e  15f  15g  14a  14b
0    0    0    1    1    0    1    15a

sixteen characters in fourteen octets:

bits number:

7    6    5    4    3    2    1    0
2g   1a   1b   1c   1d   1e   1f   1g
3f   3g   2a   2b   2c   2d   2e   2f
4e   4f   4g   3a   3b   3c   3d   3e
5d   5e   5f   5g   4a   4b   4c   4d
6c   6d   6e   6f   6g   5a   5b   5c
7b   7c   7d   7e   7f   7g   6a   6b
8a   8b   8c   8d   8e   8f   8g   7a
10g  9a   9b   9c   9d   9e   9f   9g
11f  11g  10a  10b  10c  10d  10e  10f
12e  12f  12g  11a  11b  11c  11d  11e
13d  13e  13f  13g  12a  12b  12c  12d
14c  14d  14e  14f  14g  13a  13b  13c
15b  15c  15d  15e  15f  15g  14a  14b
16a  16b  16c  16d  16e  16f  16g  15a

The bit number zero is always transmitted first. 

Therefore, in 160 octets, it is possible to pack (160 x 8)/7 = 182.8, that is 182 characters. The remaining 6 bits are set to zero as stated above.

Packing of 7 bit characters in USSD strings is done in the same way as for SMS (subclause 6.1.2.1). The character stream is bit padded to octet boundary with binary zeroes as shown above.

If the total number of characters to be sent equals (8n-1) where n = 1, 2, 3 etc., then there are 7 spare bits at the end of the message. To avoid the situation where the receiving entity confuses 7 binary zero pad bits as the @ character, the carriage return or <CR> character (defined in subclause 6.1.1) shall be used for padding in this situation, just as for Cell Broadcast.

If <CR> is intended to be the last character and the message (including the wanted <CR>) ends on an octet boundary, 
then another <CR> must be added together with a padding bit 0. The receiving entity will perform the carriage return 
function twice, but this will not result in misoperation as the definition of <CR> in subclause 6.1.1 is identical to the
definition of <CR><CR>. 

The receiving entity shall remove the final <CR> character where the message ends on an octet boundary with <CR> as 
the last character. 
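The USSD padding and stripping rules above can be sketched in Python as follows, operating on 7-bit codes before packing and after unpacking. The function names are hypothetical; 0x0D is the code of <CR> in the GSM 7 bit default alphabet table of subclause 6.2.1.

```python
from typing import List

CR = 0x0D  # <CR> in the GSM 7 bit default alphabet

def ussd_pad(septets: List[int]) -> List[int]:
    """Sender: apply the <CR> padding rules to 7-bit codes before packing."""
    padded = list(septets)
    n = len(padded)
    if n % 8 == 7:
        # 8n-1 characters leave 7 spare bits that would read as '@' (code 0); use <CR>.
        padded.append(CR)
    elif n and n % 8 == 0 and padded[-1] == CR:
        # A wanted final <CR> on an octet boundary: add a second <CR>; one zero
        # padding bit then completes the last octet.
        padded.append(CR)
    return padded

def ussd_strip(septets: List[int]) -> List[int]:
    """Receiver: drop a final <CR> when the characters end exactly on an octet boundary."""
    if septets and len(septets) % 8 == 0 and septets[-1] == CR:
        return list(septets[:-1])
    return list(septets)
```

In the second case the receiver keeps both <CR> characters and performs the carriage return twice, which is harmless because <CR><CR> has the same effect as <CR>.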

6.2 Alphabet tables



This subclause provides tables for all the alphabets to be supported by SMS, CBS and USSD. The GSM 7 bit default alphabet is mandatory. Additional alphabets are optional. Irrespective of support of an individual alphabet, an MS shall have the ability to store a short message coded in any alphabet on the (U)SIM.

6.2.1 GSM 7 bit Default Alphabet 

Bits per character: 7 

CBS/USSD pad character: CR 

Character table:

b7                0    0    0    0    1    1    1    1
b6                0    0    1    1    0    0    1    1
b5                0    1    0    1    0    1    0    1
b4 b3 b2 b1       0    1    2    3    4    5    6    7
0  0  0  0    0   @    Δ    SP   0    ¡    P    ¿    p
0  0  0  1    1   £    _    !    1    A    Q    a    q
0  0  1  0    2   $    Φ    "    2    B    R    b    r
0  0  1  1    3   ¥    Γ    #    3    C    S    c    s
0  1  0  0    4   è    Λ    ¤    4    D    T    d    t
0  1  0  1    5   é    Ω    %    5    E    U    e    u
0  1  1  0    6   ù    Π    &    6    F    V    f    v
0  1  1  1    7   ì    Ψ    '    7    G    W    g    w
1  0  0  0    8   ò    Σ    (    8    H    X    h    x
1  0  0  1    9   Ç    Θ    )    9    I    Y    i    y
1  0  1  0    10  LF   Ξ    *    :    J    Z    j    z
1  0  1  1    11  Ø    1)   +    ;    K    Ä    k    ä
1  1  0  0    12  ø    Æ    ,    <    L    Ö    l    ö
1  1  0  1    13  CR   æ    -    =    M    Ñ    m    ñ
1  1  1  0    14  Å    ß    .    >    N    Ü    n    ü
1  1  1  1    15  å    É    /    ?    O    §    o    à
1) This code is an escape to an extension of the GSM 7 bit default alphabet table. A receiving entity which does not understand the meaning of this escape mechanism shall display it as a space character.
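For implementers, the table above can be written as a 128-character Python string indexed by code value. This is a sketch: `GSM7_DEFAULT`, `decode_basic` and `encode_basic` are hypothetical names, LF and CR are mapped to "\n" and "\r", and the escape code 0x1B is shown as a space, as required for a receiving entity that does not understand the extension mechanism.

```python
from typing import List

# Index = 7-bit code (b7..b5 select the column, b4..b1 the row); 0x1B is the escape.
GSM7_DEFAULT = (
    "@£$¥èéùìòÇ\nØø\rÅå"
    "Δ_ΦΓΛΩΠΨΣΘΞ\x1bÆæßÉ"
    " !\"#¤%&'()*+,-./"
    "0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNO"
    "PQRSTUVWXYZÄÖÑÜ§"
    "¿abcdefghijklmno"
    "pqrstuvwxyzäöñüà"
)
assert len(GSM7_DEFAULT) == 128

ESC = 0x1B

def decode_basic(codes: List[int]) -> str:
    """Map 7-bit codes to text using only the default table; ESC is shown as a space."""
    return "".join(" " if c == ESC else GSM7_DEFAULT[c] for c in codes)

def encode_basic(text: str) -> List[int]:
    """Map text to 7-bit codes; raises ValueError for characters outside the table."""
    return [GSM7_DEFAULT.index(ch) for ch in text]
```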

6.2.1.1 GSM 7 bit default alphabet extension table

b7                0    0    0    0    1    1    1    1
b6                0    0    1    1    0    0    1    1
b5                0    1    0    1    0    1    0    1
b4 b3 b2 b1       0    1    2    3    4    5    6    7
0  0  0  0    0                       |
0  0  0  1    1
0  0  1  0    2
0  0  1  1    3
0  1  0  0    4        ^
0  1  0  1    5                                 2)
0  1  1  0    6
0  1  1  1    7
1  0  0  0    8             {
1  0  0  1    9             }
1  0  1  0    10  3)
1  0  1  1    11       1)
1  1  0  0    12                 [
1  1  0  1    13                 ~
1  1  1  0    14                 ]
1  1  1  1    15            \

In the event that an MS receives a code where a symbol is not represented in the above table, the MS shall display the character shown in the main GSM 7 bit default alphabet table in subclause 6.2.1.

1) This code value is reserved for the extension to another extension table. On receipt of this code, a receiving entity shall display a space until another extension table is defined.

2) This code represents the EURO currency symbol. The code value is that used for the character 'e'. Therefore a receiving entity which is incapable of displaying the EURO currency symbol will display the character 'e' instead.

3) This code is defined as a Page Break character and may be used for example in compressed CBS messages. Any mobile which does not understand the GSM 7 bit default alphabet table extension mechanism will treat this character as Line Feed.
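Because every extension character is sent as the escape code 0x1B followed by its own code, it occupies two septets, which is why each use reduces the number of displayable characters by one. The Python sketch below counts septets and resolves escaped codes. The names are hypothetical, and representing the page break (0x0A) as a form feed "\f" is an assumption made only for illustration.

```python
from typing import List

ESC = 0x1B
# Extension-table codes sent after ESC. Mapping the page break (0x0A) to a
# form feed "\f" is an assumption made for this sketch.
GSM7_EXTENSION = {"\f": 0x0A, "^": 0x14, "{": 0x28, "}": 0x29, "\\": 0x2F,
                  "[": 0x3C, "~": 0x3D, "]": 0x3E, "|": 0x40, "€": 0x65}
_BY_CODE = {code: ch for ch, code in GSM7_EXTENSION.items()}

def septet_count(text: str) -> int:
    """Septets needed for text drawn from the default and extension tables."""
    return sum(2 if ch in GSM7_EXTENSION else 1 for ch in text)

def encode_extension(ch: str) -> List[int]:
    """Two 7-bit codes for an extension character: the escape, then its own code."""
    return [ESC, GSM7_EXTENSION[ch]]

def decode_escaped(code: int, main_table_char: str) -> str:
    """Character to display for ESC followed by `code`."""
    if code == ESC:
        return " "      # 1) reserved for another extension table: display a space
    # Codes without an extension symbol fall back to the main-table character.
    return _BY_CODE.get(code, main_table_char)
```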

6.2.2 8 bit data 

8 bit data is user defined.

Padding: CR in the case of an 8 bit character set; otherwise user defined.

Character table: User Specific



6.2.3 UCS2 

Bits per character: 16 

CBS/USSD pad character: CR 

Character table: ISO/IEC 10646 [10]
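A minimal Python sketch of UCS2 conversion, in which each character becomes one 16-bit value. The byte order is not stated in this subclause; big-endian (most significant octet first) is assumed here. Characters beyond U+FFFF cannot be represented in UCS2 and are rejected, and decoding deliberately does not combine surrogate pairs. The function names are hypothetical.

```python
def encode_ucs2(text: str) -> bytes:
    """Encode each character as one 16-bit UCS2 value (big-endian order assumed)."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp > 0xFFFF:
            raise ValueError(f"U+{cp:X} is outside the 16-bit UCS2 range")
        out += cp.to_bytes(2, "big")
    return bytes(out)

def decode_ucs2(data: bytes) -> str:
    """Inverse of encode_ucs2; surrogate values are not combined into pairs."""
    if len(data) % 2:
        raise ValueError("UCS2 data must contain an even number of octets")
    return "".join(chr(int.from_bytes(data[i:i + 2], "big"))
                   for i in range(0, len(data), 2))
```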